String

Strings are Python's built-in datatype for handling text. They are immutable, so you cannot add, remove, or update any character in an existing string. To perform such an operation, you create a new string and bind it to a new or existing variable name.

A string is a sequence of characters.
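A minimal sketch of what immutability means in practice. The variable names are illustrative only: item assignment on a string fails, so a new string is built and bound to a name instead.

```python
s = "Camel"

# Item assignment is not supported on str, so this raises TypeError.
try:
    s[0] = "K"
except TypeError as exc:
    error = str(exc)
    print("TypeError:", error)

# Build a new string instead and bind it to a name.
t = "K" + s[1:]
print(t)   # Kamel
print(s)   # Camel (the original is unchanged)
```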


Escape Characters

An escape character is a character that invokes an alternative interpretation of the characters that follow it in a character sequence. It is a particular case of a metacharacter. In Python string literals, the escape character is the backslash (\).

Table: Escape Characters
Escape sequence   Hex value in ASCII   Character represented
\a                07                   Alert (Beep, Bell)
\b                08                   Backspace
\f                0C                   Formfeed
\n                0A                   Newline (Line Feed)
\r                0D                   Carriage Return
\t                09                   Horizontal Tab
\v                0B                   Vertical Tab
\\                5C                   Backslash
\'                27                   Single quotation mark
\"                22                   Double quotation mark
\?                3F                   Question mark (C only, used to avoid trigraphs; not an escape sequence in Python)
\nnn              any                  The character whose numerical value is given by nnn interpreted as an octal number
\xhh              any                  The character whose numerical value is given by hh interpreted as a hexadecimal number (exactly two digits in Python)
\e                1B                   Escape character (some C compilers only; in Python write \x1b)
\Uhhhhhhhh        none                 Unicode code point where h is a hexadecimal digit
\uhhhh            none                 Unicode code point below 10000 hexadecimal
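A short sketch of how a few of these escape sequences behave in Python string literals. The variable names are illustrative. Each escape sequence stands for a single character, and the octal, hexadecimal and Unicode forms can all name the same character.

```python
# Each escape sequence below produces a single character.
lengths = [len("\n"), len("\t"), len("\\")]
print(lengths)                 # [1, 1, 1]

# Octal, hexadecimal and Unicode escapes can all name the same character.
same = ["\101", "\x41", "\u0041", "\U00000041"]
print(same)                    # ['A', 'A', 'A', 'A']

# ord() shows the code point behind an escape sequence.
tab_code = hex(ord("\t"))
print(tab_code)                # 0x9
```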

String Types

Strings can be classified into three categories.

  • Standard string: a string in which escape sequences are processed.
  • Raw string: a string in which escape sequences are treated as ordinary characters and are not processed.
  • Formatted string literal or f-string: a string literal prefixed with f or F. These strings may contain replacement fields, which are expressions delimited by curly braces {}. While other string literals always have a constant value, formatted strings are really expressions evaluated at run time. (New in Python 3.6.)

A string can be initialized:

  • With single or double quotes ('', "").
  • On several consecutive lines, provided that it is between three single or double quotes (''' ''', """ """).
  • Without escape processing (example: s = r'\n', where s will contain the characters \ and n).
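The three ways of writing a string literal listed above can be sketched as follows. The variable names are illustrative only.

```python
# Single and double quotes create the same kind of string.
single = 'text'
double = "text"
print(single == double)        # True

# Triple quotes allow the literal to span several lines.
multi = """first line
second line"""
print(multi.count("\n"))       # 1

# A raw string keeps the backslash and the letter n as two characters.
raw = r'\n'
print(len(raw), list(raw))     # 2 ['\\', 'n']
```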

Standard String

In a standard string, escape sequences are processed. Since Python 3, strings are Unicode strings by default.


In [1]:
#### Standard String Examples: 
friend = 'Chandu\tNalluri' 
print(friend)


Chandu	Nalluri

In [2]:
manager_details = "# Roshan Musheer:\nExcellent Manager and human being."
print(manager_details)


# Roshan Musheer:
Excellent Manager and human being.

Raw String

Raw strings, on the other hand, treat escape characters as normal characters and do not process them.

  • Raw String: a = r'Roshan\tMusheer' # Roshan\tMusheer
  • Unicode String: u = u'Björk' (the u prefix is optional in Python 3)

In [7]:
a = r'Roshan\tMusheer'
print(a)


Roshan\tMusheer

In [1]:
path = "C:\new_data\technical_jargons"
print(path)
path = R"C:\new_data\technical_jargons"
print(path)


C:
ew_data	echnical_jargons
C:\new_data\technical_jargons

NOTE: both r and R work the same way

Formatted String Literal or F-String

F-strings are prefixed with f or F. These strings may contain replacement fields, which are expressions delimited by curly braces {}. While other string literals always have a constant value, formatted strings are really expressions evaluated at run time. (New in Python 3.6.)
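A brief sketch of replacement fields in an f-string, using made-up values for name and price. Any expression may appear inside the braces, and an optional format specification follows a colon.

```python
name = "Roshan"
price = 1234.5

# Any expression may appear inside the braces.
line1 = f"{name.upper()} has {len(name)} letters"
print(line1)   # ROSHAN has 6 letters

# A format specification follows a colon.
line2 = f"Price: {price:,.2f}"
print(line2)   # Price: 1,234.50

# Doubled braces produce literal braces.
line3 = f"{{name}} -> {name}"
print(line3)   # {name} -> Roshan
```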

String Operations:

Creation / Assignment


In [16]:
s = 'Camel'
print(id(s))


140100897256984

In [9]:
a = 'Roshan\tMusheer'
print(a)


Roshan	Musheer

Concatenation

String concatenation is the process of joining two or more strings into a single string. Because strings are immutable, concatenation creates a new string: the original strings remain unchanged, and the new one is built from their text.

There are multiple ways to achieve concatenation. The most common is the + operator.

Let's take an example with three strings and concatenate them using +.


In [21]:
st_the = "The "
st_action = " ran away !!!"
st = st_the + s + st_action
print(st)
print(s)
print(st_the)
print(st_action)
print(id(st))

print(id(st_the))
print(id(s))
print(id(st_action))


The Camel ran away !!!
Camel
The 
 ran away !!!
140100897622392
140100897828624
140100897256984
140100897312688
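Besides the + operator, the same result can be obtained with str.join() or an f-string. The sketch below uses a hypothetical list named parts and only illustrates that the three approaches agree.

```python
parts = ["The ", "Camel", " ran away !!!"]

with_plus = parts[0] + parts[1] + parts[2]
with_join = "".join(parts)
with_fstring = f"{parts[0]}{parts[1]}{parts[2]}"

print(with_plus)                                # The Camel ran away !!!
print(with_plus == with_join == with_fstring)   # True

# Concatenation never modifies its operands.
print(parts)
```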

In [3]:
print(dir(s))


['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

Interpolation

String interpolation (or variable interpolation, variable substitution, or variable expansion) is the process of evaluating a string literal containing one or more placeholders, yielding a result in which the placeholders are replaced with their corresponding values.


In [22]:
print( 'Size of %s => %d' % (s, len(s)))
print(dir(s))
print( 'Size of %s => %d' % (s, s.__len__()))

def size(strdata):
    c = 0
    for a in strdata:
        c+=1
    return c

print(size("Anshu"))


Size of Camel => 5
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
Size of Camel => 5
5

%-formatting

str.format()

Template Strings

Formatted String Literals (f-strings)

Formatted string literals are the newest interpolation method; they were introduced in Python 3.6.


In [3]:
name = 'World'
program = 'Python'
print(f'Hello {name}! This is {program}')
name = 'Ravi'
program = 'Python'
print(f'Hello {name}! This is {program}')


Hello World! This is Python
Hello Ravi! This is Python
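For comparison, the four interpolation styles named above can be sketched side by side. This is only an illustration; the variable names ending in _style are invented for the example.

```python
from string import Template

name, program = "World", "Python"

percent_style = "Hello %s! This is %s" % (name, program)
format_style = "Hello {}! This is {}".format(name, program)
template_style = Template("Hello $name! This is $program").substitute(
    name=name, program=program
)
fstring_style = f"Hello {name}! This is {program}"

print(percent_style)   # Hello World! This is Python
print(percent_style == format_style == template_style == fstring_style)   # True
```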

In [5]:
# String processed as a sequence
s = "Murthy "
for ch in s: print(ch, end=',')  # print each character followed by a comma
# print(help(print))
print("\b.")
print("~"*79)


M,u,r,t,h,y, ,.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In [6]:
# Strings are objects
if s.startswith('M'): print(s.upper())

print(s.lower())
print("~"*79)

# what will happen? 
print(3*s) 

# print(dir(s))


MURTHY 
murthy 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Murthy Murthy Murthy 

In [7]:
s = "   Murthy "
age = 5
print(s + str(age))
print(s.strip(), age)
# print(s + age)  # TypeError: a str and an int cannot be concatenated


   Murthy 5
Murthy 5

In [17]:
st = "    Mayank Johri    "
print(len(st))
s = st.strip()
print(len(s))
print(st.rstrip())
print(st.lstrip())


20
12
    Mayank Johri
Mayank Johri    

In [13]:
m = "Mohan Shah"
x = ["mon", "tues", "wed"]
y = ","
a = "On Leave"
print(y.join(x)) # -> mon,tues,wed
print(m.join(y)) 
print(a.join(y))
print(y.join(a)) 
print(a.join(m))


mon,tues,wed
,
,
O,n, ,L,e,a,v,e
MOn LeaveoOn LeavehOn LeaveaOn LeavenOn Leave On LeaveSOn LeavehOn LeaveaOn Leaveh

Create a string from a list of string items


In [14]:
" ".join(x)


Out[14]:
'mon tues wed'

In [15]:
book_desc = ["This", "book", "is good"]
" ".join(book_desc)


Out[15]:
'This book is good'

The % operator is used for string interpolation. Interpolation can use memory more efficiently than repeated concatenation, because the result is built in one step instead of through a chain of intermediate strings.

Symbols used in the interpolation:

  • %s: string.
  • %d: integer.
  • %o: octal.
  • %x: hexadecimal.
  • %f: floating point.
  • %e: floating point in exponential notation.
  • %%: percent sign.

Symbols can be used to display numbers in various formats.

Example:


In [20]:
# Zero padding on the left
print('Now is %02d:%02d.' % (6, 30))

# Real (the number after the decimal point specifies how many decimal digits to show)
print('Percent: %.1f%%, Exponential: %.2e' % (5.333, 0.00314))

# Octal and hexadecimal
print('Decimal: %d, Octal: %o, Hexadecimal: %x' % (10, 10, 10))


Now is 06:30.
Percent: 5.3%, Exponential: 3.14e-03
Decimal: 10, Octal: 12, Hexadecimal: a

format

In addition to the interpolation operator %, the string method str.format() and the built-in function format() are available.

The built-in function format() formats only one value per call.

Examples:


In [4]:
people = [('Mayank', 'friend', 'Manish'),
          ('Mayank', 'reportee', 'Roshan Musheer')]

# Parameters are identified by order
msg = '{0} is {1} of {2}'

for name, relationship, friend in people:
    print(msg.format(name, relationship, friend))


Mayank is friend of Manish
Mayank is reportee of Roshan Musheer

In [10]:
# Parameters are identified by name
msg = '{greeting}, it is {hour:02d}:{minute:02d}'

print(msg.format(greeting='Good Morning', minute=2, hour=10))
print(msg)
# Builtin function format()
print ('Pi =', format(3.14159, '.3e'))
print ('Pi =', format(3.14159, '.1e'))


Good Morning, it is 10:02
{greeting}, it is {hour:02d}:{minute:02d}
Pi = 3.142e+00
Pi = 3.1e+00

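The examples below show how format() handles positional arguments, alignment, padding, truncation, numeric types, lookups by name, key and index, dates, and the !r and !a conversions.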


In [11]:
'{} {}'.format('सूर्य', 'नमस्कार')


Out[11]:
'सूर्य नमस्कार'

In [41]:
'{1} {0}'.format('सूर्य', 'नमस्कार')


Out[41]:
'नमस्कार सूर्य'

In [23]:
s = '{:>30}'.format('सूर्य नमस्कार')
print(s)
print(len(s))


                 सूर्य नमस्कार
30

In [19]:
s = '{:>2}'.format('सूर्य नमस्कार')
print(s)
print(len(s))


सूर्य नमस्कार
13

In [14]:
'{:20}'.format('सूर्य नमस्कार')


Out[14]:
'सूर्य नमस्कार       '

In [49]:
'{:4}'.format('Bonjour')


Out[49]:
'Bonjour'

In [28]:
'{:^<5}'.format('Ja')


Out[28]:
'Ja^^^'

In [58]:
'{:^7}'.format('こんにちは')


Out[58]:
' こんにちは '

In [37]:
'{:.5}'.format('Bonjour')


Out[37]:
'Bonjo'

In [38]:
# Width and precision together: truncate to 5 characters, then pad to width 10

In [36]:
s = '{:10.5}'.format('testdd नमस्कार')
print(len(s))
print(s)


10
testd     

In [ ]:
'{:10.5}'.format('Bonjour')

In [106]:
'{:{align}{width}}'.format('Bonjour', align='^', width='9')


Out[106]:
' Bonjour '

In [107]:
'{:.{prec}} = {:.{prec}f}'.format('Bonjour', 2.22, prec=4)


Out[107]:
'Bonj = 2.2200'

In [66]:
'{:d}'.format(1980)


Out[66]:
'1980'

In [67]:
'{:f}'.format(3.141592653589793)


Out[67]:
'3.141593'

In [40]:
# Width 2 without a precision: the default precision of 6 digits still applies
'{:2f}'.format(3.141592653589793)


Out[40]:
'3.141593'

In [77]:
'{:04d}'.format(119)


Out[77]:
'0119'

In [68]:
'{:06.2f}'.format(3.141592653589793)


Out[68]:
'003.14'

In [78]:
'{:+d}'.format(119)


Out[78]:
'+119'

In [79]:
'{:+d}'.format(-119)


Out[79]:
'-119'

In [86]:
# The 'd' presentation type accepts integers only; complex numbers raise ValueError.
# Booleans are integers, so '{:d}'.format(True) gives '1'.
## '{:+d+d}'.format(-3 + 2j)

In [89]:
'{:=5d}'.format((- 111))


Out[89]:
'- 111'

In [90]:
'{: d}'.format(101)


Out[90]:
' 101'

In [92]:
'{name} {surname}'.format(name='Mayank', surname='Johri')


Out[92]:
'Mayank Johri'

In [95]:
user = dict(name='Mayank', surname='Johri')
'{u[name]} {u[surname]}'.format(u=user)


Out[95]:
'Mayank Johri'

In [97]:
lst = list(range(10))
'{l[2]} {l[7]}'.format(l=lst)


Out[97]:
'2 7'

In [100]:
from datetime import datetime
'{:%Y-%m-%d %H:%M}'.format(datetime(2017, 12, 23, 14, 15))


Out[100]:
'2017-12-23 14:15'

In [31]:
class Yoga(object):

    def __repr__(self):
        return 'सूर्य नमस्कार'

In [35]:
'{0!r} <-> {0!a}'.format(Yoga())


Out[35]:
'सूर्य नमस्कार <-> \\u0938\\u0942\\u0930\\u094d\\u092f \\u0928\\u092e\\u0938\\u094d\\u0915\\u093e\\u0930'

Built-in str methods

Strings implement all of the common sequence operations, along with the additional methods described below.
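As a quick sketch of the common sequence operations mentioned above (length, indexing, slicing, membership and so on), applied to a sample string chosen for illustration:

```python
s = "Camel"

print(len(s))          # 5
print(s[0], s[-1])     # C l
print(s[1:4])          # ame
print(s[::-1])         # lemaC
print("me" in s)       # True
print(min(s), max(s))  # C m
print(s.index("m"))    # 2
```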


In [42]:
myStr = "maya Deploy, version: 0.0.3 "

print(myStr.capitalize())
print(myStr.center(60))
print(myStr.center(60, "*"))
print(myStr.center(10, "*"))

print(myStr.count('a'))
print(myStr.count('e'))

print(myStr.endswith('all'))

print(myStr.endswith('.0.3'))
print(myStr.endswith('.0.3 '))

print(myStr.find("g"))
print(myStr.find("e"))


Maya deploy, version: 0.0.3 
                maya Deploy, version: 0.0.3                 
****************maya Deploy, version: 0.0.3 ****************
maya Deploy, version: 0.0.3 
2
2
False
False
True
-1
6

Note: Use the find() method only when you need to know the position of the substring. To check whether a substring occurs in a string, use the in operator:

Syntax: substring in main_string (evaluates to True or False)


In [45]:
print("ma" in myStr)


True

In [46]:
print("M" in myStr)


False

In [60]:
c = "one"
print(c.isalpha())
c = "1"
print(c.isalpha())


True
False

In [20]:
superscripts = "\u00B2"
five = "\u0A6B"
five_punjabi = "੫"
ten_hindi = "१०"
num_one = "1"
one = "one"
fractions = "\u00BC"

In [15]:
print(superscripts)
print(five)
print(five_punjabi)
print(ten_hindi)
print(num_one)
print(one)
print(fractions)


²
੫
੫
१०
1
one
¼

isdecimal


In [17]:
print(superscripts.isdecimal())
print(five.isdecimal())
print(five_punjabi.isdecimal())
print(ten_hindi.isdecimal())
print(num_one.isdecimal())
print(one.isdecimal())
print(fractions.isdecimal())


False
True
True
True
True
False
False

In [12]:
print("10 ->", "10".isdecimal())
print("10.001".isdecimal())


10 -> True
False

In [13]:
# Named text rather than str, so the built-in str type is not shadowed
text = "this 2009"
print(text.isdecimal())


False

isdigit


In [23]:
# str.isdigit() (Decimals, Subscripts, Superscripts)
print(superscripts.isdigit())
print(five.isdigit())
print(five_punjabi.isdigit())
print(ten_hindi.isdigit())
print(num_one.isdigit())
print(one.isdigit())
print(fractions.isdigit())


True
True
True
True
True
False
False

In [24]:
print("10".isdigit())
text = "this 2009"
print(text.isdigit())

text = "23443.434"
print(text.isdigit())


True
False
False

isnumeric

str.isnumeric() returns True when all characters are numeric characters, which include:

  • Digits
  • Fractions
  • Subscripts
  • Superscripts

In [26]:
print(superscripts.isnumeric())
print(five.isnumeric())
print(five_punjabi.isnumeric())
print(ten_hindi.isnumeric())
print(num_one.isnumeric())
print(one.isnumeric())
print(fractions.isnumeric())


True
True
True
True
True
False
True
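The differences between isdecimal(), isdigit() and isnumeric() are easier to see side by side. The sketch below builds a small comparison table; the samples and results dictionaries and their labels are invented for this illustration.

```python
samples = {
    "ASCII digit": "1",
    "Devanagari digits": "१०",
    "superscript two": "\u00B2",
    "fraction one quarter": "\u00BC",
    "word": "one",
}

# Collect (isdecimal, isdigit, isnumeric) for every sample.
results = {
    label: (text.isdecimal(), text.isdigit(), text.isnumeric())
    for label, text in samples.items()
}

print(f"{'sample':<22}{'isdecimal':<11}{'isdigit':<9}isnumeric")
for label, (dec, dig, num) in results.items():
    # !s converts the booleans to text before the width is applied
    print(f"{label:<22}{dec!s:<11}{dig!s:<9}{num}")
```

Each method accepts a wider set of characters than the one before it: the superscript passes isdigit() and isnumeric() but not isdecimal(), and the fraction passes only isnumeric().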

isalnum

In [36]:
print(superscripts.isalnum())
print(five.isalnum())
print(five_punjabi.isalnum())
print(ten_hindi.isalnum())
print(num_one.isalnum())
print(one.isalnum())
print(fractions.isalnum())
ten_One = "10 One"
print(ten_One.isalnum())
tenOne = "10One"
print(tenOne.isalnum())
print("one".isalnum())
print("thirteen".isalnum())


True
True
True
True
True
True
True
False
True
True
True

Case-insensitive string comparison

For ASCII strings

In [39]:
string1 = 'Hello'
string2 = 'hello'

if string1.lower() == string2.lower():
    print("The strings are the same (case insensitive)")
else:
    print("The strings are not the same (case insensitive)")


The strings are the same (case insensitive)

For Unicode strings

In [49]:
str_lower = "Σίσυφος"
str_upper = "ΣΊΣΥΦΟΣ"

if str_upper.lower() == str_lower.lower():
    print("The strings are the same (case insensitive)")
else:
    print("The strings are not the same (case insensitive)")


The strings are the same (case insensitive)

But lower() fails in some cases


In [54]:
str_lower = "ß"
str_upper = "SS"

if str_upper.lower() == str_lower.lower():
    print("The strings are the same (case insensitive)")
else:
    print("The strings are not the same (case insensitive)")


The strings are not the same (case insensitive)

So the best bet is to use casefold(). Let's replace lower() with casefold() in the above example.


In [57]:
str_lower = "ß"
str_upper = "SS"

if str_upper.casefold() == str_lower.casefold():
    print("The strings are the same (case insensitive)")
else:
    print("The strings are not the same (case insensitive)")


The strings are the same (case insensitive)
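casefold() alone does not cover text in which the same character is stored in composed and decomposed forms. One possible approach, sketched here under the assumption that canonical (NFD) normalization is acceptable for the data, is to combine casefold() with unicodedata.normalize(). The helper names caseless_equal and fold are hypothetical.

```python
import unicodedata


def caseless_equal(left, right):
    """Compare two strings ignoring case and Unicode composition differences."""

    def fold(text):
        # Hypothetical helper: normalize, casefold, then normalize again
        decomposed = unicodedata.normalize("NFD", text)
        return unicodedata.normalize("NFD", decomposed.casefold())

    return fold(left) == fold(right)


print(caseless_equal("ß", "SS"))            # True
print(caseless_equal("Σίσυφος", "ΣΊΣΥΦΟΣ"))  # True
print(caseless_equal("\u00e9", "e\u0301"))  # True: composed vs decomposed é
print(caseless_equal("Hello", "World"))     # False
```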

String Module


Various functions for dealing with text are implemented in the module string.


In [35]:
import string

# List the names defined in the string module
print(dir(string))


['Formatter', 'Template', '_ChainMap', '_TemplateMetaclass', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_re', '_string', 'ascii_letters', 'ascii_lowercase', 'ascii_uppercase', 'capwords', 'digits', 'hexdigits', 'octdigits', 'printable', 'punctuation', 'whitespace']

In [86]:
a = string.ascii_letters
print(a)


abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

In [83]:
# Rotate the alphabet one position to the left
b = a[1:] + a[0]
print(b)


bcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZa
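A rotated alphabet like this one is a natural input for str.maketrans() and str.translate(). The sketch below is an illustration restricted to lowercase letters, with invented variable names: it shifts every letter by one position and then reverses the mapping.

```python
import string

lower = string.ascii_lowercase
shifted = lower[1:] + lower[:1]   # same rotation as above, lowercase only

# str.maketrans() builds a mapping from each letter to its shifted counterpart.
table = str.maketrans(lower, shifted)

encoded = "hello xyz".translate(table)
print(encoded)   # ifmmp yza

# Reversing the mapping restores the original text.
decoded = encoded.translate(str.maketrans(shifted, lower))
print(decoded)   # hello xyz
```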

In [84]:
print(b.__doc__)


str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or
errors is specified, then the object must expose a data buffer
that will be decoded using the given encoding and error handler.
Otherwise, returns the result of object.__str__() (if defined)
or repr(object).
encoding defaults to sys.getdefaultencoding().
errors defaults to 'strict'.

In [34]:
import string

print(string.digits)
print(string.hexdigits)
print(repr(string.printable))


0123456789
0123456789abcdefABCDEF
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'

Template

The module also implements a type called Template, a model string whose placeholders can be filled in from a dictionary. Placeholders are introduced by a dollar sign ($) and may be surrounded by curly braces to avoid confusion with the text that follows.

Example:


In [91]:
import string

# Creates a template string
st = string.Template('Dated: $when\n$warning occurred in $when $$$what $$what.')

# Fills the model with a dictionary
s = st.substitute({'warning': 'Lack of electricity',
    'when': 'April 3, 2002',
    'what': 'EOM'})

# $$ is an escape for a literal dollar sign, so $$$what yields $EOM
# and $$what yields $what
print(s)


Dated: April 3, 2002
Lack of electricity occurred in April 3, 2002 $EOM $what.
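Two further details of Template may be worth a sketch: braces around a placeholder name, and the difference between substitute() and safe_substitute() when a value is missing. The template text and the variable names here are made up for the illustration.

```python
from string import Template

t = Template("${what}s were reported on $when")

# Braces separate the placeholder name from the text that follows it.
full = t.substitute(what="Outage", when="April 3, 2002")
print(full)      # Outages were reported on April 3, 2002

# substitute() raises KeyError when a value is missing...
try:
    t.substitute(what="Outage")
except KeyError as exc:
    missing = exc.args[0]
    print("missing:", missing)   # missing: when

# ...while safe_substitute() leaves the placeholder untouched.
partial = t.safe_substitute(what="Outage")
print(partial)   # Outages were reported on $when
```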

In [1]:
# Text string (str); the u prefix is optional in Python 3
u = u'Hüsker Dü'
# Encode to bytes
s = u.encode('latin1')
print(s, '=>', type(s))

# Decode the bytes back to str
u = s.decode('latin1')

print(repr(u), '=>', type(u))


b'H\xfcsker D\xfc' => <class 'bytes'>
'Hüsker Dü' => <class 'str'>

Both encode() and decode() take the name of an encoding as an argument; if it is omitted, "utf-8" is used. The most commonly used encodings are "latin1" and "utf8".
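The choice of encoding matters because the same text maps to different bytes. A small sketch, with illustrative variable names, comparing "latin1" and "utf8" for the string used above:

```python
text = "Hüsker Dü"

as_latin1 = text.encode("latin1")
as_utf8 = text.encode("utf8")

# The same text takes a different number of bytes in each encoding.
print(len(text), len(as_latin1), len(as_utf8))   # 9 9 11

# Decoding with the matching encoding restores the original string.
print(as_utf8.decode("utf8") == text)            # True

# Decoding with the wrong encoding can fail outright.
try:
    as_latin1.decode("utf8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)                             # True
```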
